A METHOD AND APPARATUS FOR DEEP PACKET PROCESSING

Field of the Invention

The present invention generally relates to
telecommunication packet processing and particularly relates
to a method for flexible parsing and searching of information
in the packet including the packet payload.






Background of the Invention 

Packet classification is a function implemented in 
networking equipment such as routers and switches, that 
extracts information from an incoming packet (this is called 
parsing) and uses this to search a data base with rules. If a 
matching rule is found, then the processing of that packet 
will be based on data associated with that rule. The parsed 
information, the rules, and the way the rules are searched are 
dependent on the application. 
For example, with an IP forwarding application, the
packet classification consists of parsing the IP destination
address from the IP header, which is then used to search a
routing table according to a longest-matching prefix search.
The routing table entry that results from this search provides
the address of the next hop to which the packet will be
forwarded. Another example is a firewall application, in which
several fields (e.g., IP source and destination addresses, TCP
port numbers, PROT byte) are parsed from the packet header,
and are then used to search the highest-priority matching rule
from a set of firewall rules. Data associated with this
firewall rule will then indicate whether the packet will be
permitted or denied access through the firewall.

Conventional applications, such as the two described
examples, have in common that the information is parsed from
well-known fields at fixed locations within the packet headers
(up to layer 4), which have fixed and relatively small sizes
(typically up to 32 bits). Furthermore, the classification can
be performed in two distinct phases: First the information is
parsed from the packet header. Next, the parsed information is
used to search a data base.

Web-server load balancing, intrusion detection and virus
scanning are examples of important emerging applications that
require more advanced packet classification capabilities than
those required by "conventional" applications as described above.
These more advanced capabilities relate specifically to the
following aspects:

1) Besides information from the packet header, also
information from the packet payload needs to be inspected.

2) The location and the amount of information that has to be
inspected within the payload is not always known in advance
and can for several applications only be determined during the
classification operation. Some applications require the use of
pattern-matching and regular-expression-based searches for
locating information within the payload.

3) The amount of information involved in the packet
classification can consist of up to multiple hundreds of
bytes.

Examples of information that has to be inspected within 
the payload for a web-server load balancer application, are 
URLs, cookies, and SSL identifiers. Another example is a set 
of known character strings related to viruses that are checked 
for by a virus-scan application.

From the above it can be understood that the packet 
classification now becomes more complex in the following two 
respects:

1) The parsing operation becomes more difficult for 
applications in which the location and amount of information 
that needs to be parsed is not known in advance, and for 
applications for which a large amount of information needs to 
be parsed. 

2) For certain applications the two distinct phases of parsing 
and searching cannot be used, but instead it is necessary to 
repeat parsing and searching in alternating steps or combine 
the two steps (e.g., pattern-matches). 

In addition to the functional requirements outlined
above, packet classification must be performed on the fly on
incoming packets (this is called wire-speed packet
classification) for typical link speeds between 1 Gb/sec and
10 Gb/sec today. A second requirement is that the data
structures used for the packet classification should be
organized such that a minimum amount of memory is needed for
storing them, in order to minimize the costs. A third
requirement is the support for fast updates of the rule set,
as required by the dynamic nature of several new applications
(e.g., web-server load balancing).

The standard solution for realizing a flexible parse
function suitable for advanced packet classification as
described above, is a programmable state machine. The concept
and disadvantages of a prior-art implementation of a
programmable state machine will now be illustrated using Fig.
1 and Fig. 2.

Fig. 1 shows a state diagram for parsing two patterns
"121h" (h means hexadecimal) and "ABh" from an input stream of
4-bit characters. There are 6 possible states (S0, S1, S2, S3,
S4, S5) represented by circles; the arrows represent the state
transitions. Nodes S3 and S5 are end states.

Fig. 2 shows a prior-art implementation of a programmable
state machine for the state diagram of Fig. 1, which requires
one memory access per state transition. In this example the
states are assigned the following 3-bit state vectors:

S0 - 000b     S2 - 010b     S4 - 100b

S1 - 001b     S3 - 011b     S5 - 101b

In Fig. 2 the current state (3 bits) concatenated with
the 4-bit input value is used as an offset (address) into a
table containing the next state for each possible combination
of a current state and input value, resulting in a total of
2^(3+4) = 2^7 = 128 table entries. Disadvantages of this approach
are (1) the inefficient use of storage (e.g., there are 128
table entries in Fig. 2 of which many contain the same next
state) and (2) the large number of table entries that have to
be written while "programming" the table for the given state
diagram, resulting in a long construction (update) time.
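
The size and redundancy of such a directly indexed table can be illustrated with a minimal Python sketch. The transitions are taken from the state-transition rules listed for Fig. 1 in the detailed description; the names `next_state`, `build_direct_table` and `histogram` are illustrative assumptions and not part of the prior-art implementation, and the two unused 3-bit state encodings (110b, 111b) are simply filled according to the same wild-card behaviour.

```python
from collections import Counter

STATE_BITS, INPUT_BITS = 3, 4

def next_state(state: int, char: int) -> int:
    """Next state of the Fig. 1 diagram, with S0..S5 encoded as 0..5."""
    if char == 0x1:
        return 3 if state == 2 else 1   # S2 --1--> S3, any other state --1--> S1
    if char == 0x2 and state == 1:
        return 2                        # S1 --2--> S2
    if char == 0xB and state == 4:
        return 5                        # S4 --B--> S5
    if char == 0xA:
        return 4                        # any state --A--> S4
    return 0                            # everything else returns to S0

def build_direct_table() -> list[int]:
    """One entry per (state, input) pair; address = state bits, then input bits."""
    table = [0] * (1 << (STATE_BITS + INPUT_BITS))
    for state in range(1 << STATE_BITS):
        for char in range(1 << INPUT_BITS):
            table[(state << INPUT_BITS) | char] = next_state(state, char)
    return table

table = build_direct_table()
histogram = Counter(table)  # how often each next state occurs in the table
print(len(table), histogram[0])
```

Under these assumptions, all 128 entries must be written to program the table even though most of them hold the same next state S0, which is the storage and update cost the text refers to.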

In the US patent of Solidum, US06167047, a programmable state
machine is disclosed for packet classification. The Solidum
patent approach requires at most one memory access per state
transition, but requires less memory than the implementation
concept shown in Fig. 2. A disadvantage of the Solidum
approach is that it requires significantly more complex
hardware: at least one processor, one program memory and one
separate stack memory. This results in increased chip-area
costs and increased power consumption.

A prior-art solution for realizing a flexible search
function suitable for advanced packet classification as
described above, is a tree structure. One example is a
Patricia tree as described in D. R. Morrison's original paper
"Patricia - Practical Algorithm to Retrieve Information Coded
in Alphanumeric", Journal of the ACM, Vol. 15, 1968.

A disadvantage of the prior-art is that no solutions
exist that can support both efficient parsing and efficient
searching. A programmable state machine cannot efficiently
implement a search since the state space is typically too
large, resulting in significant memory requirements and/or
complex logic to determine the next state. On the other hand,
a tree structure cannot implement parsing as efficiently as a
programmable state machine, especially because the latter can
more efficiently handle typical state transitions that are
more complex than the branch functions which occur in a
typical tree search. Therefore, implementing parsing using a
tree structure would require many nodes and therefore results
in significant memory requirements.

Consequently, a disadvantage of the prior-art is that
different hardware has to be used to implement the parsing and
searching. A second disadvantage is that this makes it more
difficult to realize a more advanced packet classification
function that supports the alternating use of parsing and
searching or the combination of parsing and searching as
described above.



Summary of the Invention 



It is therefore an object of the present invention to
provide a deep packet processing method and apparatus which
could sustain high speed while performing parsing and
searching operations.

It is another object of the invention to reduce storage
requirements and hardware-implementation complexity.

These objects are reached by the use of the method for
creating the data structure of a programmable state machine
according to claims 1 to 3. The data structure comprises
state-transition rules of a programmable state machine for
parsing. Storage requirements are reduced by use of an
algorithm known as BaRT and by distributing states over
multiple state spaces implemented using separate
state-transition rule tables. The parsing method of claim 4
takes advantage of the data structure and can be performed at
wire-speed.



The searching method of claim 5 uses the same data
structure and can be performed at wire-speed as well. The
parsing and searching methods can be performed alternately
or combined, still at wire-speed, according to claims 6 and 7.
The same hardware, according to claim 8, can be used for
parsing and searching. Because of the reduced memory
requirements, examples of hardware solutions embodying the
invention can be implemented in on-chip memory according to
claim 9. A computer program can advantageously implement
examples of the present invention according to claim 10.
With the use of a compression technique already disclosed
for a lookup scheme implementation, a preferred embodiment of
the invention provides a powerful data structure allowing a
combination of an efficient programmable state machine for
parsing and an efficient tree based searching mechanism.

In a particularly preferred embodiment of the invention, 
the same hardware can be used for parsing and searching. 

Brief Description of the Drawings 

Preferred embodiments of the present invention will now
be described, by way of example only, with reference to the
accompanying drawings, in which:

FIG. 1 illustrates a state diagram for parsing patterns
from an input stream of 4-bit characters;

FIG. 2 shows a prior art programmable state machine 
implementation for the state diagram of Fig. 1; 

Fig. 3 illustrates 6 transition rule entries according to 
a first preferred embodiment, which implement the state 
diagram of Fig. 1, and are stored in one register or one 
memory location; 

Fig. 4 illustrates a BaRT-compressed state-transition
rule table according to the first preferred embodiment, which
implements the state diagram of Fig. 1, and is organized such
that a maximum of N=4 transition rule entries are stored in
each memory location;

Fig. 5 illustrates the fields within a transition rule 
entry according to a second preferred embodiment; 
Fig. 6 illustrates two BaRT-compressed state-transition
rule tables according to the second preferred embodiment,
which implement the state diagram of Fig. 1, and are organized
such that a maximum of N=4 transition rule entries are stored
in each memory location;

Fig. 7 is the flow chart describing the creation of the
data structure according to the second preferred embodiment;

Fig. 8 is the flow chart describing the creation of the
compressed state-transition rule table which is one step of
the creation of the data structure as described in Fig. 7;

Fig. 9 is the flow chart describing the calculation of an
index mask for distributing transition rule entries over a
minimum number of entry-blocks, which is one step of the
creation of the compressed state-transition rule table as
described in Fig. 8;

Fig. 10 illustrates a data structure consisting of three
BaRT-compressed tables according to the second preferred
embodiment, which implement a prefix-match search on three
input characters, and are organized such that a maximum of N=2
entries are stored in each memory location;

Fig. 11 illustrates a flow chart for the process of
searching and parsing according to the second preferred
embodiment;

Fig. 12 illustrates a flow chart for deriving a
prioritized list of state-transition rules according to the
first and second preferred embodiments, each involving a
ternary match condition on the current state and input value,
for a given state diagram.
Detailed Description of the preferred embodiment

Programmable state machine:

In a preferred embodiment of the present invention, there
is provided a method to perform deep packet processing. The
method comprises the creation of a data structure which
improves the programmable state machine of the prior art. In
the data structure, state transitions are represented as a
list of so called state-transition rules, each containing a
ternary match condition for the combination of the current
state and input value, and a next state. With ternary match
conditions, the matching patterns comprise bits having three
possible states 0, 1 or X, X being a wild-card symbol for the
"don't care" condition.

If a state-transition rule contains a ternary match
condition that matches a given current state and input, then
this state-transition rule is said to be matching. If multiple
state-transition rules can all match the same current state
and input, then these state-transition rules are assigned
different priorities.

For a given current state and input value, the next state
is now determined by the highest-priority state-transition
rule matching the current state and input value.

An example of a list of state-transition rules for the
state machine in Fig. 1 is (with decreasing priorities):

transition  state  input  [ state  input ]       next state

1           S2     1h     [ 010    0001b ]  ->   S3 [ 011b ]
2           *      1h     [ xxx    0001b ]  ->   S1 [ 001b ]
3           S1     2h     [ 001    0010b ]  ->   S2 [ 010b ]
4           S4     Bh     [ 100    1011b ]  ->   S5 [ 101b ]
5           *      Ah     [ xxx    1010b ]  ->   S4 [ 100b ]
6           *      *      [ xxx    xxxxb ]  ->   S0 [ 000b ]

('*' and 'x' are wild-card symbols meaning "don't care",
with '*' being used as a wild-card for the entire state or the
entire input value, and 'x' being used as a wild-card for a
single bit position).

Note that one possible algorithm for generating those
rules is described later in the document in reference to Fig.
12.

The next state is now determined by searching the
highest-priority state-transition rule that matches the
current state S and input I. For example, if the current state
is S2 and the input equals 1 then state-transition rule 1 will
match, indicating that the next state will be S3. For any
other current state in combination with an input equal to 1,
state-transition rule 2 will match, resulting in a transition
to state S1. All state transitions in Fig. 1 are described
with only 6 state-transition rules.
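
The prioritized lookup just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ternary conditions are written as 7-character strings (3 state bits followed by 4 input bits), and the names `RULES`, `ternary_match` and `next_state` are hypothetical helpers rather than part of the disclosed apparatus.

```python
# (ternary match condition over 3 state bits + 4 input bits, next state),
# listed in order of decreasing priority as in the rule list above.
RULES = [
    ("0100001", 0b011),  # rule 1: S2, 1h -> S3
    ("xxx0001", 0b001),  # rule 2: *,  1h -> S1
    ("0010010", 0b010),  # rule 3: S1, 2h -> S2
    ("1001011", 0b101),  # rule 4: S4, Bh -> S5
    ("xxx1010", 0b100),  # rule 5: *,  Ah -> S4
    ("xxxxxxx", 0b000),  # rule 6: *,  *  -> S0
]

def ternary_match(pattern: str, bits: str) -> bool:
    """A pattern bit matches if it is 'x' (don't care) or equals the input bit."""
    return all(p in ("x", b) for p, b in zip(pattern, bits))

def next_state(state: int, char: int) -> int:
    """Return the next state of the highest-priority matching rule."""
    bits = f"{state:03b}{char:04b}"
    for pattern, nxt in RULES:          # first match = highest priority
        if ternary_match(pattern, bits):
            return nxt
    raise ValueError("no matching rule")

print(next_state(0b010, 0x1))  # current state S2, input 1
print(next_state(0b100, 0x1))  # current state S4, input 1
```

With current state S2 and input 1, rule 1 is selected (next state S3); with any other current state and input 1, rule 2 is selected (next state S1), as stated in the text.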

If there are only a small number of state-transition
rules, then in the preferred embodiment, these are stored as
so called state-transition rule entries, abbreviated to
transition rule entries, in a register or in one memory
location. This is shown in Fig. 3 for the above 6
state-transition rules. The ternary match condition of each
transition rule entry is stored as a combination of a (binary)
test value and a (binary) test mask. The ternary match
condition will match if the bits of the current state and
input value equal the bits of the test value at the bit
positions corresponding to the set bits in the test mask. The
remaining bit positions are don't care. For example, the
ternary match condition "xxx 0001" of state-transition rule 2
is stored as a test value 0000001b and a test mask 0001111b in
the corresponding transition rule entry.

In a preferred embodiment of the present invention, the
next state is determined by performing a parallel comparison
of the current state and input character against the ternary
match conditions, stored as test values and test masks, in the
corresponding transition rule entries. In case of multiple
matches, the matching transition rule entry with the highest
priority will be selected (in Fig. 3 the entries are stored in
order of decreasing priority from left to right). This
operation is performed for each new 4-bit input character
until one of the two end states (S3 or S5) is reached.
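
The test value/test mask encoding and the per-character operation can be sketched as follows. This is a software illustration only: the hardware compares all entries in parallel, whereas the loop below checks them sequentially in priority order, which selects the same entry. The helper names `to_value_mask`, `step` and `parse` are assumptions introduced for this sketch.

```python
PATTERNS = [  # ternary conditions of the six rules, decreasing priority
    ("0100001", 0b011), ("xxx0001", 0b001), ("0010010", 0b010),
    ("1001011", 0b101), ("xxx1010", 0b100), ("xxxxxxx", 0b000),
]

def to_value_mask(pattern: str) -> tuple[int, int]:
    """Encode a ternary pattern as (test value, test mask); 'x' bits get mask 0."""
    value = int(pattern.replace("x", "0"), 2)
    mask = int("".join("0" if c == "x" else "1" for c in pattern), 2)
    return value, mask

# Transition rule entries as (test value, test mask, next state).
ENTRIES = [(*to_value_mask(p), nxt) for p, nxt in PATTERNS]
END_STATES = {0b011, 0b101}  # S3 and S5

def step(state: int, char: int) -> int:
    key = (state << 4) | char            # current state followed by 4-bit input
    for value, mask, nxt in ENTRIES:     # priority order stands in for parallel compare
        if (key & mask) == value:
            return nxt
    raise ValueError("unreachable: the last entry matches everything")

def parse(chars, state: int = 0b000):
    """Consume 4-bit characters until an end state is reached."""
    for count, char in enumerate(chars, start=1):
        state = step(state, char)
        if state in END_STATES:
            return state, count
    return state, len(chars)

print(to_value_mask("xxx0001"))
print(parse([0x1, 0x2, 0x1]))
```

The first call reproduces the test value 0000001b and test mask 0001111b given for rule 2; the second walks S0, S1, S2, S3 for the input "121h".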



BaRT Compression:

For state machines that have too many states,
implementation issues (e.g., memory width, timing) can make it
impossible to store all transition rule entries in one memory
location or to test all entries in parallel. In this case, the
preferred embodiment uses the BaRT compression scheme to
distribute the transition rule entries over multiple memory
locations. The BaRT compression scheme has been disclosed in a
conference paper by Jan Van Lunteren, published in the
proceedings of IEEE Globecom, volume 3, pages 1615-1619,
November 2001, under the title 'Searching Very Large Routing
Tables in Wide Embedded Memory'.

The BaRT compression scheme is based on a special hash
function for exact-, prefix- and ternary-match searches. The
hash index (which is called compressed index) comprises a
selected subset of the input bits (in this case the current
state and input character). These bits are chosen such that
the number of collisions for each compressed index value is
bounded by a value N. In other words, for any given value of
the compressed index, at most N entries can possibly match the
input. These entries are then stored at the location within
the hash table (which is called compressed state-transition
rule table in the preferred embodiment) corresponding to that
compressed index value. This concept will now be explained by
applying it to the example of Fig. 3 for a collision bound
N=4.

In Fig. 4, the compressed index consists of the second
bit of the state register 430 and the most significant bit of
the 4-bit input 440. A method to determine the bits that form
the compressed index will be discussed later. Because the
compressed index consists of 2 bits, the compressed
state-transition rule table will contain 4 (2^2) blocks, each
containing at most N=4 transition rule entries. A block is
illustrated as a row in Fig. 4. Each block is stored in one
memory location and can be read in one memory access. The base
address for the table is stored in a pointer 410. The
compressed index can be specified by a so called index mask
420 which has set bits at the bit locations that correspond to
the bits that are extracted as compressed index.

In Fig. 4, the test value and the test mask fields of the
transition rule entries are combined into one ternary test
vector field, in order to make the figure more compact and
understandable. The ternary vectors consisting of '0', '1',
and 'x' are stored in the state-transition rule table 400
wherein, for instance, a ternary "xxx 0001" corresponds to a
test value/test mask combination "000 0001/000 1111".

Now for each value of the compressed index, at most N=4
transition rule entries can match the current state and input
character. For example, if the second bit of the current
state and the most significant bit of the input character
would both equal zero, then only the three transition rule
entries that are contained within the block corresponding to a
compressed index value 00b can possibly match the current
state and input character. All state-transition rule entries,
the same ones as shown in Fig. 3, are distributed in this way
over the various compressed index values (note that certain
transition rule entries can be matching for multiple
compressed index values and are therefore stored in more than
one block). The bits of the ternary vectors that are part of
the compressed index are underlined within the transition
rule entries in Fig. 4 for illustrative purposes.

For a given current state and input character, the next
state can now be determined in the following way. First, the
compressed index bits are extracted from the current state and
input character, based on the index mask 420. Next, this
compressed index is used to select a block within the
compressed state-transition rule table that is referred to by
the pointer 410. The entire block is then read using one
memory access. All transition rule entries in one block are
then compared in parallel as described before. Also in this
case, the entries are ordered within a block according to
decreasing priorities: the next state is taken from the first
matching transition rule entry (from left to right). The state
register is then loaded with the next state from the selected
matching entry.
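
The distribution of entries over blocks and the subsequent lookup can be sketched in Python as below. The sketch assumes that "the second bit of the state register" denotes the middle bit of the 3-bit state vector, giving the index mask 0101000b over the 7 bits of state and input; the names `build_blocks`, `BLOCKS` and `lookup` are hypothetical, and a Python dictionary stands in for the memory addressed through the pointer.

```python
from itertools import product

RULES = [  # ternary test vectors (3 state bits + 4 input bits), decreasing priority
    ("0100001", 0b011), ("xxx0001", 0b001), ("0010010", 0b010),
    ("1001011", 0b101), ("xxx1010", 0b100), ("xxxxxxx", 0b000),
]
INDEX_MASK = "0101000"  # assumed: middle state bit + most significant input bit

def build_blocks(rules, index_mask):
    """Store each rule in every block whose compressed index it can match."""
    positions = [i for i, m in enumerate(index_mask) if m == "1"]
    blocks = {}
    for index_bits in product("01", repeat=len(positions)):
        blocks["".join(index_bits)] = [
            (pattern, nxt) for pattern, nxt in rules   # priority order is preserved
            if all(pattern[p] in ("x", b) for p, b in zip(positions, index_bits))
        ]
    return blocks

BLOCKS = build_blocks(RULES, INDEX_MASK)

def lookup(state: int, char: int) -> int:
    bits = f"{state:03b}{char:04b}"
    # Extract the compressed index bits selected by the index mask.
    index = "".join(b for b, m in zip(bits, INDEX_MASK) if m == "1")
    for pattern, nxt in BLOCKS[index]:   # one block corresponds to one memory access
        if all(p in ("x", b) for p, b in zip(pattern, bits)):
            return nxt                   # first match has the highest priority
    raise ValueError("no matching entry")

print({index: len(block) for index, block in BLOCKS.items()})
```

Under this assumption the block for compressed index 00b holds three entries, in line with the text, and no block exceeds the collision bound N=4.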

The process of extracting the compressed index, finding 
the highest priority matching entry, and updating the state 
register, is performed for each new input character until one 
of the two end states (S3 or S5) has been reached. 

Index-mask calculation for BaRT compression: 

In a preferred embodiment of the present invention, the 
bits that comprise the compressed index are selected in the 
following way. This will also be called index-mask calculation 
because the index mask uniquely defines the bits that are part 
of the compressed index. 



If the current state and input character consist together
of a total of m bits, then there exist a total of 2^m different
possible index masks, that each cover all possible ways of
extracting 0, 1, 2, .., and m bits from the current state and
input character. In a first step, all these index masks are
determined and ordered by an increasing number of set bits.
For example, in Fig. 4 the current state and input character
contain a total of m=7 bits. The 2^7 = 128 possible index masks,
ordered by increasing number of set bits are:



0000000,
1000000, 0100000, 0010000, ..., 0000001,
1100000, 1010000, 1001000, ..., 0000011,
1110000, 1101000, 1100100, ..., 0000111,
...
1111110, 1111101, 1111011, ..., 0111111,
1111111



Next, these index masks are processed in the given order. For
each index mask the maximum number of collisions is determined
that occurs for all possible compressed index values
corresponding to that index mask, for a given set of
transition rule entries. The first index mask for which the
maximum number of collisions does not exceed the given
collision bound N, is the index mask that will be the result
of the index-mask calculation.

If an index mask contains k set bits, then the corresponding
compressed index will consist of a total of k bits, and the
corresponding compressed state-transition rule table will
contain 2^k blocks of (at most) N entries. By testing the index
masks ordered by increasing number of set bits, the first
index mask found will have a minimum number of set bits,
resulting in the smallest compressed state-transition rule
table (i.e., in the highest compression).
The above index mask calculation is a brute-force
approach that tests all possible index masks. The index mask
calculation can be made faster and simpler by only testing a
selected subset of index masks. This subset can be selected
based on the number of transition rule entries in combination
with the collision bound N. For example, for a collision bound
N=4 and 32 transition rule entries, the compressed
state-transition rule table needs to contain at least 8 blocks
of N=4 entries (8*4=32) to be able to store all transition
rule entries. In order to index 8 blocks, the compressed index
needs to consist of at least 3 bits (2^3 = 8). Consequently,
only index masks need to be tested that contain at least 3 set
bits.
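
The lower bound used to prune the search can be written as a short calculation. The function name `min_index_bits` is an assumption introduced for this sketch; it only reproduces the arithmetic of the example above.

```python
def min_index_bits(num_entries: int, collision_bound: int) -> int:
    """Smallest k such that 2^k blocks of collision_bound entries hold all entries."""
    blocks_needed = -(-num_entries // collision_bound)  # ceiling division
    return (blocks_needed - 1).bit_length()             # ceil(log2(blocks_needed))

print(min_index_bits(32, 4))  # 32 entries, N=4: 8 blocks, hence 3 index bits
```

Index masks with fewer set bits than this bound cannot satisfy the collision bound and need not be tested; masks at or above the bound still have to be checked, since entries containing wild-cards may be stored in more than one block.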

For those skilled in the art, parallelism available in 
hardware implementations can be used to realize an index-mask 
calculation that can determine an optimum index mask in a time 
that is linear with the number of transition rule entries. 

Improved Programmable State Machine: 



In a second embodiment of the invention, the deep packet 
processing can be improved. Each transition rule entry as 
illustrated in Fig. 4 is extended with an index mask and 
pointer field. Such an entry is shown in Fig. 5. This table 
entry can be used to implement a programmable state machine 
for parsing as well as to implement a tree-like structure for 
searching.



In the aforementioned first preferred embodiment, there
is a state register comprising at least log2(s) bits to
implement a programmable state machine with s states. The
extended transition rule entry makes it possible to support
programmable state machines using a smaller state register
that has a fixed number of bits independent of the number of
states. This allows a more efficient support of state machines
with large numbers of states. This will now be illustrated
using the example of the state machine in Fig. 1 in order to
obtain the data structure shown in Fig. 6. The various steps
are described by the flow chart of Fig. 7.

As a first step 700 in the second preferred embodiment,
all states are distributed over smaller state spaces and
assigned state vectors that are unique within each state
space. In this example, the 6 states in Fig. 1 are distributed
over 2 state spaces in the following way with the following
state-vector assignment:

State space 1:        State space 2:
S0 - 00               S2 - 00
S1 - 01               S3 - 01
S5 - 10               S4 - 10

Note that this distribution can be done in any arbitrary way.
However, a typical objective of the distribution results from
the size of the state register. If, for example,
implementation issues or other reasons result in a state
register consisting of k bits, then the states should be
distributed over multiple state spaces such that each state
space contains at most 2^k states. In this situation, unique
state-vectors of at most k bits can be assigned to each state
within a state space, which will fit in the state register.

After applying the above distribution of the states over
two state spaces, the original state-transition rules can now
be written as follows 710:



State space 1:

transition  state  input  [ state input ]       next state

1           *      1      [ xx 0001 ]  ->  S1   [ 01b - state space 1 ]
2           S1     2      [ 01 0010 ]  ->  S2   [ 00b - state space 2 ]
3           *      A      [ xx 1010 ]  ->  S4   [ 10b - state space 2 ]
4           *      *      [ xx xxxx ]  ->  S0   [ 00b - state space 1 ]

State space 2:

transition  state  input  [ state input ]       next state

1           S2     1      [ 00 0001 ]  ->  S3   [ 01b - state space 2 ]
2           *      1      [ xx 0001 ]  ->  S1   [ 01b - state space 1 ]
3           S4     B      [ 10 1011 ]  ->  S5   [ 10b - state space 1 ]
4           *      A      [ xx 1010 ]  ->  S4   [ 10b - state space 2 ]
5           *      *      [ xx xxxx ]  ->  S0   [ 00b - state space 1 ]



In the next step 720, each state space is implemented using a
compressed state-transition rule table in the same way as with
the first preferred embodiment. The difference with the first
preferred embodiment is that now the index mask and pointer
associated with the state space of which the next state is
part are stored together with the next state in the
extended transition rule entries.






Fig. 6 shows the resulting structure for an
implementation in which the memory width allows 4
transition rule entries to be stored in one location and
compared in parallel. The compressed state-transition rule
table 610, corresponding to state space 1, consists of one
block of four entries. The compressed state-transition rule
table 600, corresponding to state space 2, consists of two
entry-blocks, which are selected using a one-bit compressed
index 620 which is extracted from the current state register
630 and 4-bit input character 640. The two base pointers 650
and 660 corresponding to the two compressed tables are denoted
as SP1 and SP2, respectively.
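
The way the extended entries chain the two tables together can be sketched in Python. This is a sketch under stated assumptions, not the circuit of Fig. 6: the dictionary `MEMORY` stands in for the memory addressed by the pointers SP1 and SP2, each entry carries the index mask and pointer of the table that holds its next state, the index mask 100000b for state space 2 is the one derived in the index-mask calculation of this description, and the names `step`, `parse` and `STATE_NAMES` are hypothetical.

```python
SP1_MASK, SP2_MASK = "000000", "100000"  # index masks of the two tables

# Entry: (ternary test vector over 2 state bits + 4 input bits,
#         next state vector, index mask and pointer for the next state's table)
MEMORY = {
    "SP1": [[                                   # state space 1: a single block
        ("xx0001", 0b01, SP1_MASK, "SP1"),      # *,  1 -> S1
        ("010010", 0b00, SP2_MASK, "SP2"),      # S1, 2 -> S2
        ("xx1010", 0b10, SP2_MASK, "SP2"),      # *,  A -> S4
        ("xxxxxx", 0b00, SP1_MASK, "SP1"),      # *,  * -> S0
    ]],
    "SP2": [
        [                                       # compressed index 0b
            ("000001", 0b01, SP2_MASK, "SP2"),  # S2, 1 -> S3
            ("xx0001", 0b01, SP1_MASK, "SP1"),  # *,  1 -> S1
            ("xx1010", 0b10, SP2_MASK, "SP2"),  # *,  A -> S4
            ("xxxxxx", 0b00, SP1_MASK, "SP1"),  # *,  * -> S0
        ],
        [                                       # compressed index 1b
            ("xx0001", 0b01, SP1_MASK, "SP1"),  # *,  1 -> S1
            ("101011", 0b10, SP1_MASK, "SP1"),  # S4, B -> S5
            ("xx1010", 0b10, SP2_MASK, "SP2"),  # *,  A -> S4
            ("xxxxxx", 0b00, SP1_MASK, "SP1"),  # *,  * -> S0
        ],
    ],
}
STATE_NAMES = {("SP1", 0): "S0", ("SP1", 1): "S1", ("SP1", 2): "S5",
               ("SP2", 0): "S2", ("SP2", 1): "S3", ("SP2", 2): "S4"}
END_STATES = {"S3", "S5"}

def step(state, mask, pointer, char):
    bits = f"{state:02b}{char:04b}"
    index_bits = "".join(b for b, m in zip(bits, mask) if m == "1")
    block = MEMORY[pointer][int(index_bits, 2) if index_bits else 0]
    for vector, nxt, nxt_mask, nxt_pointer in block:    # decreasing priority
        if all(v in ("x", b) for v, b in zip(vector, bits)):
            return nxt, nxt_mask, nxt_pointer
    raise ValueError("no matching entry")

def parse(chars):
    state, mask, pointer = 0b00, SP1_MASK, "SP1"        # start in S0
    for char in chars:
        state, mask, pointer = step(state, mask, pointer, char)
        if STATE_NAMES[(pointer, state)] in END_STATES:
            break
    return STATE_NAMES[(pointer, state)]

print(parse([0x1, 0x2, 0x1]), parse([0xA, 0xB]))
```

The state register here holds only 2 bits; which of the six states is meant follows from the combination of the state vector with the current table pointer.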



Fig. 8 and Fig. 9 show the flow charts describing the
creation of a compressed state-transition rule table for each
state space 720. This will now be explained for the second
state space, which involves 5 entries with the following test
vectors:

entry 1    00 0001
entry 2    xx 0001
entry 3    10 1011
entry 4    xx 1010
entry 5    xx xxxx

The maximum number of entries per entry-block equals N=4
800, 900. The number of transition rules equals 5, which is
larger than the maximum number of entries per entry-block
(answer Yes to test 810), therefore the compressed table will
contain multiple entry-blocks, that have to be indexed by a
compressed index. For this purpose, an index mask must be
calculated 830 in the following way. First all possible index
masks are determined and ordered by an increasing number of
set bits 910. Fewer set bits correspond to a smaller
compressed index, fewer entry blocks and therefore a better
compression. There exist a total of 63 (2^6 - 1) possible non-zero
values of a 6-bit vector (2 bits state + 4 bits input), which
are, ordered according to an increasing number of set bits:

100000b
010000b
001000b
000100b
000010b
000001b
110000b
101000b
100100b
...
111111b

The first index mask is 100000b 920. To this index mask
correspond two possible compressed index values, namely 0b and
1b 930. For this index mask the entries will be mapped on the
two compressed index values in the following way 940:

0b: entries 1, 2, 4, 5
1b: entries 2, 3, 4, 5

(this mapping is obtained by taking the left-most bit of the
test vector: in case of a 0b, the entry is mapped on index 0b,
in case of a 1b, the entry is mapped on index 1b, in case of a
xb, the entry is mapped on both index 0b and index 1b).

The maximum number of entries mapped on one compressed
index value equals M=4 950. Because M <= N (smaller or equal),
this index mask is selected (answer Yes to test 960). If M > N
(answer No to test 960), the next index mask is selected
(970).
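
The brute-force index-mask calculation walked through above can be sketched in Python for the five test vectors of state space 2. The function names `masks_by_popcount`, `distribute` and `select_index_mask` are assumptions of this sketch; masks with an equal number of set bits are tried in the order shown in the listing above (left-most bits first).

```python
from itertools import combinations

ENTRIES = ["000001", "xx0001", "101011", "xx1010", "xxxxxx"]  # entries 1..5

def masks_by_popcount(width: int):
    """Yield the set-bit positions of all non-zero masks, fewest set bits first."""
    for k in range(1, width + 1):
        yield from combinations(range(width), k)  # position 0 = left-most bit

def distribute(entries, positions):
    """Map entry numbers onto every compressed index value they can match."""
    blocks = {}
    for index in range(1 << len(positions)):
        index_bits = f"{index:0{len(positions)}b}"
        blocks[index_bits] = [
            number for number, vector in enumerate(entries, start=1)
            if all(vector[p] in ("x", b) for p, b in zip(positions, index_bits))
        ]
    return blocks

def select_index_mask(entries, bound: int):
    width = len(entries[0])
    if len(entries) <= bound:                     # a single block suffices
        return "0" * width, {"": list(range(1, len(entries) + 1))}
    for positions in masks_by_popcount(width):
        blocks = distribute(entries, positions)
        if max(len(block) for block in blocks.values()) <= bound:   # M <= N
            mask = "".join("1" if i in positions else "0" for i in range(width))
            return mask, blocks
    raise ValueError("no index mask satisfies the collision bound")

mask, blocks = select_index_mask(ENTRIES, bound=4)
print(mask, blocks)
```

For N=4 the first mask tested, 100000b, already satisfies the bound, with entries 1, 2, 4, 5 on compressed index 0b and entries 2, 3, 4, 5 on compressed index 1b, as in the worked example.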



The number of set bits in index mask 100000b equals k=1
840. Consequently, the table consists of 2^k = 2 entry-blocks
(corresponding to both compressed index values) each
containing N=4 entries. Therefore the table contains a total
of 2*4 = 8 entries. After sufficient memory has been allocated
850, the entries can be written within each of the two blocks
(as shown above: entries 1,2,4,5 on compressed index value 0b,
and entries 2,3,4,5 on compressed index value 1b) ordered by
decreasing priority 860. The same procedure is used to
construct a compressed table for state space 1. In this case,
the number of transition rules is not larger than the number
of entries in one entry block (N=4); answer No to test 810.
Consequently, the index mask equals 000000b 820 and the
compressed table consists of only one entry block 840, 850,
860. After both tables have been constructed, the index
mask/pointer combination for each table can be written in the
corresponding fields within the entries involving next states
in the corresponding state spaces 870, 730.
In practical implementations, the number of entries is 
typically much greater than with the state diagram used to 
illustrate those various embodiments of the present invention 
presented herein. 



Searching: 

Fig. 10 illustrates an example of a data structure 
according to the aforementioned second preferred embodiment, 
that can be used for searching. The state register and the 
next state fields in the transition rule entries within the 
compressed tables are not used in this search, and the two 
left-most bits within the test vectors that correspond to the 
state register equal xxb (don't care). 

The data structure implements a prefix search on a 12-bit 
vector consisting of a first 1000, second 1010, and third 1020 
4-bit input value as shown in Fig. 10. The prefixes and 
corresponding search results are: 

            prefix                   prefix length   search result 
prefix 1:   0001 0010 0011b (123h)   12              P 
prefix 2:   0101b (5h)               4               Q 
prefix 3:   1010b (Ah)               4               R 

The data structure for this example consists of three 
compressed tables, that are each "indexed" by one of the input 
values. The compressed table that is indexed by the first 
input value, is used to determine whether the above prefixes 
might match the input, based on the first 4 bits of those 
prefixes, which are 0001b (prefix 1), 0101b (prefix 2), and 
1010b (prefix 3). The test vectors corresponding to those 
prefixes are: 

xx 0001b (prefix 1) 
xx 0101b (prefix 2) 
xx 1010b (prefix 3) 

(the bit positions related to the state register (1030) 
contain xxb as mentioned above). 

In this example, the number of entries per entry-block is 
chosen to be equal to N=2. For this value of N and the above 
test vectors, the compressed index computed according to the 
BaRT algorithm consists of one bit 1070, and the entries 
related to prefix 1 and prefix 2 are mapped on a compressed 
index value of 0b, and the entry related to prefix 3 is mapped 
on a compressed index value 1b. 

Both prefix 2 and prefix 3 have a length equal to 4. 
Consequently, if the test vector stored in the entry 
corresponding to one of those prefixes matches the first 
4-bit input value, then the corresponding prefix matches the 
input. In that case the search result can be retrieved from 
the pointer field of the matching entry. If the first input 
value equals '5'h, then the result will be Q. If the first 
input value equals 'A'h, then the result will be R. 

Prefix 1 has a length equal to 12. This implies that all 
three input values 1000, 1010, 1020 have to be tested in order 
to determine whether this prefix is matching. This is 
implemented in the following way in the data structure in 
Fig. 10. The first input value 1000 is tested by the 
compressed table 1040. If the first input value matches 
the test vector xx 0001b that is stored in the entry related 
to prefix 1, the left-most entry in the entry-block 
corresponding to a compressed index value 0b, then the index 
mask and pointer fields of this entry are retrieved and used 
to access a second compressed table 1050, which tests the 
second input value 1010. The only valid entry in this table 
contains a test vector starting with xxb (for the state 
register) followed by the second group of four bits of 
prefix 1, resulting in xx 0010b. If this test vector matches 
the second input value 1010, then the index mask and pointer 
fields of this entry are retrieved and used to access a third 
compressed table 1060, which tests the third input value 1020. 
The only valid entry in this table contains a test vector 
starting with xxb (for the state register) followed by the 
third group of four bits of prefix 1, resulting in xx 0011b. 
If this test vector matches the third input value 1020, this 
means that prefix 1 is matching the given set of three input 
values. In that case, the search result can be retrieved from 
the pointer field of the matching entry. If the three 4-bit 
input characters equal '123'h, then the result will be P. 
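
The three-table prefix search described above can be modelled with the following Python sketch. It is a simplified model under stated assumptions rather than the actual layout of Fig. 10: the table names `T1` to `T3`, the tuple layout of an entry, and the choice of 001000b as the one-bit index mask of the first table (the first single-bit mask that keeps every entry block within N=2 for the three test vectors) are assumptions made for the example.

```python
def compressed_index(mask, bits):
    """Keep only the bits selected by the index mask."""
    return "".join(b for m, b in zip(mask, bits) if m == "1")

def ternary_match(test_vector, bits):
    """A position matches if the test vector holds 'x' or the same bit."""
    return all(t in ("x", b) for t, b in zip(test_vector, bits))

# Each table: (index mask, entry blocks keyed by compressed index value).
# Each entry: (test vector, is_final, result or name of the next table).
TABLES = {
    "T1": ("001000", {"0": [("xx0001", False, "T2"), ("xx0101", True, "Q")],
                      "1": [("xx1010", True, "R")]}),
    "T2": ("000000", {"": [("xx0010", False, "T3")]}),
    "T3": ("000000", {"": [("xx0011", True, "P")]}),
}

def prefix_search(nibbles, tables=TABLES, start="T1"):
    """Return the search result for a sequence of 4-bit values, or None."""
    table = start
    for nibble in nibbles:
        mask, blocks = tables[table]
        bits = "00" + format(nibble, "04b")  # state bits are don't care here
        block = blocks.get(compressed_index(mask, bits), [])
        for test_vector, is_final, payload in block:
            if ternary_match(test_vector, bits):
                break
        else:
            return None          # no matching entry in the entry block
        if is_final:
            return payload       # result read from the pointer field
        table = payload          # follow the pointer to the next table
    return None

print(prefix_search([0x1, 0x2, 0x3]))  # expected to print P
print(prefix_search([0x5, 0x0, 0x0]))  # expected to print Q
```

Only prefix 1 needs all three tables; the entries for prefixes 2 and 3 are final, so their result is available after the first 4-bit value.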

Parse and Search operation: 

The flow chart of Fig. 11 illustrates an example of a 
method for parsing and searching according to the second 
preferred embodiment. The first step 1100 is to initialize the 
state register, the current index mask and the current base 
pointer with values that correspond to the first compressed 
table involved in the parse or search operation. The next step 
1110 is to select the first input character to be analyzed. 
The next step 1120 is to extract the compressed index value 
from the input data and state register, based on the current 
index mask, and to use this to select an entry-block within 
the compressed table that is referred to by the current base 
pointer. In the next step 1130 a matching entry is searched 
within the selected entry-block, by comparing the test vector 
in each entry against the state register and input character. 
The first matching entry found is selected. The operation ends 
if no match is found: answer No to test 1140. If a match is 
found (answer Yes to test 1140), there is a first case where 
the entry read is final: answer Yes to test 1150. For parsing, 
this means that the end state of the state machine has been 
reached; S3 and S5 in the example of Fig. 1. For searching, 
this means that a result has been found (as an example, R for 
the input value of 'A'h in the example of Fig. 10). A final 
entry can be identified in many ways, for example, using a 
flag bit or a special value for the pointer. In this case, if 
a final entry is found then the operation ends. If the 
matching entry is not a final entry (answer No to test 1150), 
then new values for the current state register, current index 
mask and current base pointer are extracted from the matching 
entry and become the current values 1160, and a new 4-bit 
character value is selected 1170. Based on the new values, the 
loop starting with 1120 is entered again. This loop is 
executed until no matching entry is found (answer No to test 
1140) or a final entry is found to be matching (answer Yes to 
test 1150). 
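
The loop of Fig. 11 can be approximated by the Python sketch below. It is a minimal model under assumptions, not the hardware implementation: the `Entry` fields, the single table `T0`, and the tiny three-entry rule set (which reaches a final entry when a 4-bit value 1 is followed by a 4-bit value 2) are hypothetical, and returning `None` when the input is exhausted is a choice made for the example.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    test: str                   # ternary vector: 2 state bits + 4 input bits
    next_state: str = "00"
    next_mask: str = "000000"
    next_table: str = "T0"
    final: bool = False

# Hypothetical data structure: one table with a single entry block,
# entries ordered by decreasing priority.
TABLES = {
    "T0": {"": [
        Entry("010010", final=True),       # state 01, input 2: final entry
        Entry("xx0001", next_state="01"),  # any state, input 1
        Entry("xxxxxx", next_state="00"),  # default rule
    ]},
}

def run(tables, state, mask, table, chars):
    """Return the final entry reached, or None if the operation ends without one."""
    for ch in chars:                                   # steps 1110 and 1170
        bits = state + format(ch, "04b")
        index = "".join(b for m, b in zip(mask, bits) if m == "1")  # step 1120
        block = tables[table].get(index, [])
        entry = next((e for e in block                 # step 1130: first match
                      if all(t in ("x", b) for t, b in zip(e.test, bits))), None)
        if entry is None:                              # answer No to test 1140
            return None
        if entry.final:                                # answer Yes to test 1150
            return entry
        # step 1160: the matching entry supplies the new current values
        state, mask, table = entry.next_state, entry.next_mask, entry.next_table
    return None

result = run(TABLES, "00", "000000", "T0", [0x3, 0x1, 0x2])
print(result is not None and result.final)
```

The same loop serves both parsing and searching; only the contents of the entries differ.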
Algorithm for deriving prioritized state-transition rules 

Fig. 12 shows a flow chart describing an example of a 
method for deriving a prioritized list with state-transition 
rules, which each involve a ternary match condition on the 
current state and input value. This method may be used for 
building the data structure supporting the programmable state 
machine of the embodiments of the invention. This algorithm 
will be explained using the example of the state diagram shown 
in Figure 1. The state transitions for this state diagram are: 



state   input                 next state 
S0      0, 2..9, B..F    ->   S0 
S1      0, 3..9, B..F    ->   S0 
S2      0, 2..9, B..F    ->   S0 
S4      0, 2..9, C..F    ->   S0 
S0      1                ->   S1 
S1      1                ->   S1 
S4      1                ->   S1 
S1      2                ->   S2 
S2      1                ->   S3 
S0      A                ->   S4 
S1      A                ->   S4 
S2      A                ->   S4 
S4      A                ->   S4 
S4      B                ->   S5 



An input value i that has not been processed is selected 1200. 
Next, for this input value i, the most frequently occurring 
next state s is determined 1210, and all transitions with 
input value i to that next state s are replaced 1220 by one 
transition rule "* i -> s" with a priority 1. All 
transitions that involve the same input i but a different next 
state than s are assigned a priority 2, 1230. These steps are 
repeated (answer No to test 1240) until all the input values 
are processed: answer Yes to test 1240. 
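
The loop of steps 1200 to 1240 can be sketched in Python as follows, using the state transitions listed above as input. The function names `expand` and `first_pass` and the tuple layout of a rule (state, input, next state, priority) are assumptions made for this illustration; the text does not say how ties between equally frequent next states are broken, and the sketch simply takes the first one encountered.

```python
from collections import Counter

HEX = "0123456789ABCDEF"

TRANSITIONS = [
    ("S0", "0, 2..9, B..F", "S0"), ("S1", "0, 3..9, B..F", "S0"),
    ("S2", "0, 2..9, B..F", "S0"), ("S4", "0, 2..9, C..F", "S0"),
    ("S0", "1", "S1"), ("S1", "1", "S1"), ("S4", "1", "S1"),
    ("S1", "2", "S2"), ("S2", "1", "S3"),
    ("S0", "A", "S4"), ("S1", "A", "S4"), ("S2", "A", "S4"), ("S4", "A", "S4"),
    ("S4", "B", "S5"),
]

def expand(spec):
    """Expand an input spec such as '0, 2..9, B..F' into single hex digits."""
    digits = []
    for part in spec.split(","):
        part = part.strip()
        if ".." in part:
            low, high = part.split("..")
            digits.extend(HEX[HEX.index(low):HEX.index(high) + 1])
        else:
            digits.append(part)
    return digits

def first_pass(transitions):
    """Steps 1200-1240: one wildcard rule per input, exceptions get priority 2."""
    by_input = {}
    for state, spec, nxt in transitions:
        for i in expand(spec):
            by_input.setdefault(i, []).append((state, nxt))
    rules = []
    for i in HEX:                                   # step 1200: next input value
        pairs = by_input.get(i, [])
        if not pairs:
            continue
        s = Counter(n for _, n in pairs).most_common(1)[0][0]    # step 1210
        rules.append(("*", i, s, 1))                              # step 1220
        rules.extend((st, i, n, 2) for st, n in pairs if n != s)  # step 1230
    return rules

for rule in first_pass(TRANSITIONS):
    print(rule)
```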
The loop execution results in: 



state   input        next state   priority 
*       0       ->   S0           1 
*       1       ->   S1           1 
*       2       ->   S0           1 
*       3       ->   S0           1 
*       4       ->   S0           1 
... 
*       9       ->   S0           1 
*       A       ->   S4           1 
*       B       ->   S0           1 
... 
*       F       ->   S0           1 
S1      2       ->   S2           2 
S2      1       ->   S3           2 
S4      B       ->   S5           2 



The next step 1250 is now to determine the most 
frequently occurring next state s within transition rules with 
priority 1, that does not occur in any transition rule with 
priority 2. If such a state does not exist the transition rule 
list is completed: answer No to test 1260, the method ends. 
Such a state exists in the example, namely S0: answer Yes to 
test 1260. All transition rules with priority 1 involving a 
next state S0 are now replaced by a default transition rule 
"* * -> S0" with priority 0, 1270. This results in: 

state   input        next state   priority 
*       *       ->   S0           0 
*       1       ->   S1           1 
*       A       ->   S4           1 
S1      2       ->   S2           2 
S2      1       ->   S3           2 
S4      B       ->   S5           2 
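
Steps 1250 to 1270 can be sketched in Python as shown below, starting from the priority 1 and priority 2 rules produced by the preceding loop. The function name `add_default_rule` and the tuple layout (state, input, next state, priority) are assumptions for this illustration only.

```python
from collections import Counter

# Rules after the loop of steps 1200-1240, as (state, input, next state, priority).
RULES = (
    [("*", i, "S0", 1) for i in "023456789BCDEF"]
    + [("*", "1", "S1", 1), ("*", "A", "S4", 1)]
    + [("S1", "2", "S2", 2), ("S2", "1", "S3", 2), ("S4", "B", "S5", 2)]
)

def add_default_rule(rules):
    """Steps 1250-1270: fold the most frequent priority-1 next state into a default rule."""
    priority2_states = {nxt for _, _, nxt, prio in rules if prio == 2}
    counts = Counter(nxt for _, _, nxt, prio in rules
                     if prio == 1 and nxt not in priority2_states)   # step 1250
    if not counts:                       # answer No to test 1260
        return list(rules)
    s = counts.most_common(1)[0][0]
    kept = [r for r in rules if not (r[3] == 1 and r[2] == s)]
    return [("*", "*", s, 0)] + kept     # step 1270: default rule, priority 0

for rule in add_default_rule(RULES):
    print(rule)
```

For the example this reduces the nineteen rules of the previous step to the six rules listed above.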
1. A method for creating the data structure of a 
programmable state machine to parse an input word chain by 
identifying a word pattern within said input word chain to 
point to a resulting address, said method comprising the steps 
of: 

- creating a state table corresponding to states of said 
programmable state machine for identifying the word pattern in 
the input word chain, each state table entry comprising a 
s-bit current state, an input n-bit word and a s-bit next 
state; 

- reducing the number of entries in the state table by 
converting the entries into a reduced number of 
state-transition rule entries, each containing a ternary match 
condition expressed as a test value comprising the s-bit 
current state, the input n-bit word, a test mask on the s-bit 
current state and the input n-bit word in combination, and the 
s-bit next state; and, 

- ordering the reduced state table entries obtained by the 
execution of the preceding step, in a prioritized order, with 
most frequently used transition rules having the highest 
priority. 

2. The method of claim 1 further comprising the steps of: 

- defining as a hash index, for the reduced state table, a set 
of i bit locations inside the s-bit current state and the 
input n-bit word in combination, and an integer N, such that, 
at most, N table entries can match a hash index value; 

- creating a compressed state table, indexed by the hash 
index, having 2^i entries, each entry corresponding to one value 
of the hash index, and each having a maximum of N transition 
rules of the reduced state table corresponding to the same 
hash index value and written in a priority order; and, 

- saving an s + n bit index mask corresponding to the hash 
index, said index mask having bits coded to a first binary 
value except for the hash index bit locations coded to a 
second binary value and saving a base address pointer (SP1, 
SP2) of the compressed space table. 

3. The method of claim 1 or 2, further comprising the step 
of: 

- dividing the compressed state-transition rule table into 
more than one compressed state-transition rule table; and, 

- extending in each of the divided compressed state tables, 
each state-transition rule with the index mask and a base 
address pointer of the divided compressed table of the next 
state in said state-transition rule. 

4. The method for parsing an input word chain using a data 
structure of a programmable state machine created according to 
any one of claims 1 to 3, said method comprising the steps of: 

- initializing the current state, a current index bit mask of 
the data structure and a current base pointer; 

- defining the first word of the input word chain as being the 
current input; 

- extracting the hash index from the current state and current 
input according to the index mask; 

- searching in the space table indicated by the current base 
pointer, the entry corresponding to the hash index and 
searching for the state-transition rule matching the current 
state and current input, if multiple transition rules match, 
selecting one with the highest priority; 

- if the next state is not a final state, extracting the new 
values for the current state, the current index mask of the 
data structure and the current base pointer; 

- defining the next word of the input word chain as being the 
current input; 

- repeating the preceding extracting hash index, searching and 
extracting new values until the next read state is final. 

5. The method for searching a resulting value corresponding 
to an input word chain using a data structure of a 
programmable state machine created according to any one of 
claims 1 to 3 wherein the compressed tables have been built 
using a hash index taken among the n bits of the word input, 
the two left-most bits within the test vectors, that 
correspond to the state register, are unused, the next state 
bits of each transition rule are unused and the base address 
pointer of the transition rules may include a final result (P, 
R, Q), said method comprising the steps of: 

- defining the first word of the input word chain as being the 
current input; 

- extracting the hash index from the current input according 
to the index mask; 

- searching in the space table indicated by the current base 
pointer, the entry corresponding to the hash index and 
searching for the state-transition rule matching the current 
state and current input, if multiple transition rules match, 
selecting one with the highest priority; 

- reading the base address pointer field, and, if it does not 
include a final result, repeating the following steps until 
the base address pointer field includes the final result; 

- defining the next word of the input word chain as being the 
current input; 

- checking that the next word maps the test mask of the 
transition rule pointed by the read base address pointer; 

- reading the base address pointer field. 

6. A method for performing wire-speed deep packet processing 
comprising the steps of, upon reception of an input packet 
consisting in a variable chain of words, repeating 
alternatively the steps of the method of claim 4 and the steps 
of the method of claim 5 until all the words have been 
processed. 



7. A method for performing wire-speed deep packet processing 
comprising the steps of, upon reception of an input packet 
consisting in a variable chain of words, combining in an order 
corresponding to the deep packet processing the set of steps 
of the method of claim 4 with the set of steps of the method 
of claim 5 until all the words have been processed. 

8. An apparatus for deep packet processing comprising means 
adapted for implementing the steps of the method according to 
any one of claims 1 to 7. 

9. A chip embedded apparatus comprising means adapted for 
implementing the steps of the method according to any one of 
claims 1 to 7. 

10. A computer program product comprising programming code 
instructions for executing the steps of the method according 
to any one of claims 1 to 7 when said program is executed on a 
computer. 
A METHOD AND APPARATUS FOR DEEP PACKET PROCESSING 

Abstract 

A method and apparatus for deep packet processing 
including a parsing and a searching method supported by a data 
structure storing the state-transition rules in the 
state-transition rule tables of a programmable state machine 
for parsing. 

The state-transition rule table is then compressed using 
the BaRT compression algorithm. Each transition rule comprises 
a test value, a test mask and a next state field. 

In a second embodiment the state-transition rule table is 
split into more than one state-transition rule table 
corresponding to disjoint state spaces, thus allowing more 
flexibility in the use of storage space. 

Finally a parsing and searching method can be implemented 
using the same hardware. The searching and parsing methods can 
be implemented alternatively or in any combination at 
wire-speed. 

Fig. 6 
[Flow chart: construction of a compressed table, steps 800 to 870] 

start 
800: N := maximum number of entries per entry block 
810: number of transition rules > N? 
     No:  820: index mask := 0 
     Yes: 830: calculate index mask for the transition rules that 
          relate to the current state space 
840: k := the number of set bits in the index mask 
850: allocate memory for the new table with a total of 2^k entry 
     blocks, each containing N entries; store the pointer to the table 
860: determine for each transition rule on which entry blocks it 
     is mapped for the calculated index mask, and write the 
     corresponding entries within these entry blocks, such that the 
     entries within one entry block are ordered by decreasing priority 
870: write the calculated index mask and the pointer to the 
     table in the index mask and pointer fields of the entries with a 
     next state in the current state space (those fields will be written 
     in a later stage for entries with a next state in a different state 
     space) 